1. INTRODUCTION

The principal purpose of biometrics is to identify or verify a person from unique behavioural or physical
characteristics. Automatic identity verification using ear images is considered a significant research field by
the biometric community. The ear images used in such systems can generally be taken from video footage or
profile headshots. The acquisition is non-intrusive and contactless, and it does not rely on the cooperation of
the person to be recognized [1]. In this respect, ear recognition technology shares similarities with other
image-based biometric modalities. Ear biometrics also has the appealing property of distinctiveness: recent
studies have shown empirically that particular features of the ear are recognizably different even for identical
twins [2]. This has significant implications for security applications and places ear images on the same level
as epigenetic biometric modalities (e.g., the iris). Besides, ear images can supplement other biometric
modalities in automatic recognition systems, providing identity cues when the other information is
unavailable or unreliable. For instance, face recognition in surveillance applications may struggle with
profile faces; in such cases, ear images can serve as a source of information for identifying persons in the
footage.
In this regard, research on ear recognition applications [3]-[5] shows the importance of ear recognition
technology and the potential it offers to biometric systems. The study in [6] stated that ear images are
unique enough to recognize a person and can be used effectively as a biometric feature. Generally, the left
and right ears of a person are similar, which helps ear recognition techniques perform efficiently [7]. Figure 1
shows the external structure of the ear. The structural patterns of the ear (e.g., concha, helix, crus of helix,
and lobe) differ in shape and position from person to person. Furthermore, ears have specific advantages over
other biometric modalities: according to the study in [8], the ear structure remains constant between 8 and 70
years of age. Also, unlike faces, human ears do not change appearance with expressions (e.g., sadness or
happiness).
Figure 1. The external ear structure [9]


Nowadays, ear recognition is one of the topics that attract high interest from researchers: several
techniques and methods have been proposed, and many ear image databases have been provided to train and
test such systems [10], [11]. However, despite the many research efforts on ear biometrics, only one
commercial system that exploits ear biometrics for identification is presently available [9]. Commercial ear
recognition technology is therefore limited, and open issues and challenges have not yet been appropriately
addressed. Machine learning (ML) algorithms provide tools and techniques for classifying and distinguishing
between two or more classes [12]-[14]. These algorithms have proved efficient and effective in different
domains such as voice pathology detection [15]-[18], vehicle detection [19], spam email identification [20],
medical image classification [21], [22], detection of conflicting flows in SDN [23], and language
identification [24]-[26]. They have also been used efficiently as a major component of ear recognition
systems [27]-[29], where the ML algorithm classifies the ear images and thereby recognizes the person they
belong to. Many methods and techniques have been proposed in the field of ear recognition.


2. RELATED WORK 

Here, we review some state-of-the-art ear recognition systems. The study in [30] used a genetic
algorithm for ear recognition. The genetic algorithm removes unnecessary features and performs feature
selection by choosing the best chromosome. Local and global features were combined to extract distinctive
features from the ear images: the global features were extracted with the Gabor-Zernike operator (GZO), and
the local features with local phase quantization (LPQ). In addition, the quality of the ear images was
improved with contrast-limited adaptive histogram equalization (CLAHE) in the pre-processing step. For
classification, a nearest neighbour classifier identifies persons from their ear images. Three ear image
databases were used to evaluate the proposed system: IIT125, USTB-1, and IIT221. The reported average
accuracy reached 99.2% on IIT125, 100% on USTB-1, and 97.13% on IIT221. However, this method was
evaluated in terms of accuracy only, although other important measures such as precision, recall, specificity,
and F-measure can also be used to evaluate a system.

The work in [31] proposed a new ear recognition system based on adaptive approach Runge-Kutta
(AARK) segmentation. The AARK segmentation technique is mainly applied to identify the objects and
boundaries in ear images; it also increases segmentation speed and provides good shape connectivity. A
classification and regression tree (CART) classifier is used to classify the ear images, and the discrete
wavelet transform (DWT) is used to extract features from the images. The images in this study are converted
from two dimensions (2D) into 1D. The system includes pre-processing, ring projection, and information
normalization. Pre-processing converts the grayscale image into a binary image, which is then provided as
input to the ring projection: each pixel is classified as white or black according to a chosen threshold, where
white corresponds to values higher than the threshold and black to values lower than it. The purpose of ring
projection is to convert the images from 2D to 1D, while information normalization modifies, or normalizes,
the pixels of the grayscale images. The system was evaluated with various measures such as accuracy,
precision, recall, and F-measure. The best reported results were 95.45% sensitivity, 97.69% precision, and
96.55% F-measure. However, the highest accuracy obtained by this method is still not encouraging.
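The binarisation and ring projection steps described for [31] can be sketched as follows. This is a minimal NumPy illustration, not the implementation of [31]: `binarize` and `ring_projection` are hypothetical names, pixels exactly equal to the threshold are assumed to become black, and the rings are assumed to be centred on the image centre with radii rounded to the nearest integer (other variants centre them on the object's centroid). Averaging each ring turns the 2D image into a 1D signal that does not change when a square image is rotated by 90 degrees.

```python
import numpy as np

def binarize(gray, threshold):
    """Pixels above the threshold become white (1); all others become black (0)."""
    return (np.asarray(gray) > threshold).astype(np.uint8)

def ring_projection(img):
    """Mean pixel value on each integer-radius ring around the image centre (2D -> 1D)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0          # assumed ring centre: the image centre
    rows, cols = np.indices((h, w))
    radius = np.rint(np.hypot(rows - cy, cols - cx)).astype(int)
    sums = np.bincount(radius.ravel(), weights=img.ravel())
    counts = np.bincount(radius.ravel())
    return sums / np.maximum(counts, 1)             # empty rings (if any) give 0
```

A typical call would be `ring_projection(binarize(gray, 128))`, where the threshold of 128 is only an illustrative value.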

Another study [29] built an ear recognition system in which an extreme learning machine (ELM)
performs the classification that identifies users from their ear images. The ELM classifier used 10,000
hidden neurons. Two feature extraction techniques were used to extract features from the ear images: local
binary patterns (LBP) and histograms of oriented gradients (HOG). Both are well known and perform
effectively in several pattern recognition domains. The study used the USTB database, which has 180 ear
images of 60 subjects (teachers and students). The results showed that ELM with HOG features performed
slightly better than ELM with LBP features: HOG-ELM achieved 99.86% accuracy, while LBP-ELM achieved
99.59%. However, this method was evaluated by accuracy only, and other measures were ignored. Moreover,
the number of hidden neurons in the ELM was large, which might increase time and memory consumption.
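For readers unfamiliar with ELM, the sketch below shows the standard training scheme: the input weights and biases of the hidden layer are drawn at random and left fixed, and only the output weights are solved in closed form with the Moore-Penrose pseudo-inverse. It is a generic illustration, not the code of [29]; the tanh activation and the function names are assumptions. It also shows where the cost of a large hidden layer comes from: the random weight matrix `W` has one column per hidden neuron, and the hidden-layer matrix `H` has one column per hidden neuron for every training sample, so with 10,000 hidden neurons both become large, and computing the pseudo-inverse of `H` is the main training cost.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Train a single-hidden-layer ELM.

    X: (n_samples, n_features) inputs; T: (n_samples, n_classes) one-hot targets.
    Hidden weights are random and never updated; only beta is solved for.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))  # input-to-hidden weights (fixed)
    b = rng.standard_normal(n_hidden)                # hidden biases (fixed)
    H = np.tanh(X @ W + b)                           # hidden-layer output, (n_samples, n_hidden)
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights
    return W, b, beta

def elm_predict(model, X):
    """Return the index of the highest-scoring class for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

One-hot targets can be built with `np.eye(n_classes)[labels]`, and `rng` is any `np.random.default_rng(seed)` generator.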

Furthermore, three different convolutional neural network (CNN) models were used for ear image
recognition in [32]. The CNNs were used both to represent the ear images and to classify them. The models
are VGG-16, AlexNet, and GoogLeNet: VGG-16 has 16 layers, AlexNet contains 5 convolutional layers and 3
fully connected layers, and GoogLeNet contains 22 layers. Domain adaptation was applied through two stages
of fine-tuning. In the first stage, the authors created a database of ear images from the Multi-PIE face
database and fine-tuned the pre-trained CNN models on it. In the second stage, fine-tuning was performed on
the UERC database as the target database. The authors also combined the CNN models to improve accuracy.
They evaluated the classification performance with respect to ear image quality, aspect ratio, intensity level,
data augmentation, and alignment. The method achieved an accuracy of up to 99.71%. However, alignment did
not improve the classifier's accuracy.

These studies reveal some drawbacks: most systems have not been evaluated in terms of execution time
or measures other than accuracy, and the results of some systems in this field are not encouraging.
Additionally, the RF classifier has not been widely used or investigated in the ear recognition domain.
Therefore, this study applies the RF classifier to human ear recognition from images, and the proposed
method is evaluated in terms of execution time and several other evaluation measures.


3. MATERIALS AND RESEARCH METHOD 

In this work, an ear recognition system is built using HOG features and a random forest (RF)
classifier. Its performance is evaluated with various measures in order to assess its efficiency and
effectiveness properly. The system consists of three main stages. The first stage is the ear image database
used in this system. The second stage is the feature extraction technique used to extract features from the ear
images. The third stage is the classification part, where RF is used as a classifier to identify users from their
ear images. Figure 2 shows the flowchart of the proposed system. These three stages are explained in the
following subsections, respectively.


3.1. Ear image database 

In this study, the ear images were collected from [11]. This database, called IITD II, was produced by
the Indian Institute of Technology Delhi, India. The IITD II database contains 793 ear images from 221
classes. All images were taken under indoor lighting conditions from the same profile angle. The database is
distributed in pre-processed form: all ear images are tightly cropped, centred and mutually aligned, and of
equal dimensions. Moreover, all left ear images have been mirrored, so the whole database appears to consist
of right ear images. Each class contains 3 to 6 ear images. In our work, all 793 images of the 221 classes are
used. The database is divided into 67% training and 33% testing for each class.
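A per-class split of this kind can be sketched as below. The text does not state how 67% of a class with 3 to 6 images is rounded; the sketch assumes rounding to the nearest integer while keeping at least one image on each side, so a 3-image class contributes 2 training images and a 6-image class contributes 4. `per_class_split` and its `seed` parameter are hypothetical. For reference, Table 1 lists 539 training vectors, which leaves 254 of the 793 images for testing.

```python
import random
from collections import defaultdict

def per_class_split(labels, train_fraction=0.67, seed=0):
    """Return (train_indices, test_indices) with about train_fraction of each class in training."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumed rounding rule: nearest integer, but keep at least one image on each side.
        k = max(1, min(len(indices) - 1, round(train_fraction * len(indices))))
        train.extend(indices[:k])
        test.extend(indices[k:])
    return sorted(train), sorted(test)
```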


Figure 2 panels: IITD II Dataset (input ear images) → Features Extraction Step (HOG) → Classification Step (RF) → Output (identify the person)

Figure 2. The flowchart of the proposed system


3.2. Histograms of oriented gradients (HOG) 

The HOG descriptor accumulates gradient directions over the pixels of small regions called cells. The
resulting one-dimensional histograms are concatenated into a feature vector that is fed to the classification
process. Let G be the grayscale function used to describe and analyse the image. Each image is divided into
a group of cells of $N \times N$ pixels; Figure 3 shows how an image is divided into cells. The gradient
orientation $\theta_i^j$ of every pixel is calculated as in (1), and Figure 3 (2) and (3) illustrate this step.
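Equation (1) did not survive in this copy of the text. For reference, a common way to compute the gradient and its unsigned orientation at each pixel from the grayscale function G, using central differences, is given below, together with the quantisation of an orientation into one of the M bins; whether this matches the authors' (1) exactly is an assumption. The orientation $\theta_i^j$ of the i-th pixel of cell j is then $\theta(x, y)$ evaluated at that pixel.

```latex
G_x(x, y) = G(x+1, y) - G(x-1, y), \qquad G_y(x, y) = G(x, y+1) - G(x, y-1)

\lVert \nabla G(x, y) \rVert = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}

\theta(x, y) = \operatorname{atan2}\big(G_y(x, y),\, G_x(x, y)\big) \bmod \pi \in [0, \pi)

b(x, y) = \left\lfloor \frac{M \, \theta(x, y)}{\pi} \right\rfloor \in \{0, 1, \ldots, M-1\}
```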
Figure 3. HOG feature extraction steps


Moreover, the orientations $\theta_i^j$, $i = 1, \ldots, N^2$, of the same cell $j$ are accumulated and
quantized into an M-bin histogram, as shown in Figure 3 (4) and (5). In the last step, all the obtained
histograms are arranged and concatenated into the HOG histogram, which is the final outcome of the feature
extraction process, as shown in Figure 3 (6). Figure 3 gives an example with a cell size of four pixels and
eight orientation bins per cell histogram.
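The cell-histogram steps above can be sketched in NumPy as follows, assuming magnitude-weighted votes, unsigned orientations in [0, π), and 4×4-pixel cells with eight bins as a reading of the Figure 3 example. Standard HOG additionally normalises overlapping blocks of cells, which this subsection does not describe. The hypothetical helper `hog_length` shows that the 3780 predictors in Table 1 are consistent with that block layout: with 8×8-pixel cells, 2×2-cell blocks overlapping by one cell, and 9 bins (the defaults of MATLAB's `extractHOGFeatures`), a 180×50-pixel image gives 21 × 5 blocks × 36 values = 3780, as does a 128×64 image (15 × 7 blocks). The image size and HOG settings are not stated here, so this link is an assumption.

```python
import numpy as np

def cell_hog(image, cell=4, bins=8):
    """Concatenated per-cell histograms of unsigned gradient orientation.

    Follows the steps in Section 3.2; no block normalisation is applied, and
    pixels beyond the last full cell are ignored.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                      # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hists = []
    for r in range(n_rows):
        for c in range(n_cols):
            win = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            b = np.minimum((theta[win] * bins / np.pi).astype(int), bins - 1)
            # Magnitude-weighted vote of each pixel into its orientation bin.
            hists.append(np.bincount(b.ravel(), weights=mag[win].ravel(), minlength=bins))
    return np.concatenate(hists)

def hog_length(height, width, cell=8, block=2, overlap=1, bins=9):
    """Length of a block-based HOG descriptor (cells grouped into overlapping blocks)."""
    step = block - overlap
    blocks_r = (height // cell - block) // step + 1
    blocks_c = (width // cell - block) // step + 1
    return blocks_r * blocks_c * block * block * bins
```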


3.3. Random forest classifier 

In data science and machine learning, tree-based learning algorithms are among the most widely used.
One of them is the random forest (RF) [33], which is built from several decision trees; combining the trees in
this way is called an ensemble method. Each tree classifies the input vector, and its output is treated as a vote
for a class; the forest selects the class with the most votes. RF is an ensemble classifier based on the
divide-and-conquer method, through which a group of individually weak learners together forms a powerful
learner. Assume TS is the training set with F features. TS can be represented as in (2).


$$TS = \{(X_1, L_1), (X_2, L_2), \ldots, (X_n, L_n)\} \tag{2}$$


Where $X_i = \{x_{i1}, x_{i2}, \ldots, x_{iF}\}$ is the vector formed by the F feature values, and $L_i$ is the output
class of the i-th vector.

Next, z datasets $TS_1, TS_2, \ldots, TS_z$ are formed, each the same size as the training set. They are
selected by random sampling with replacement: to create each dataset $TS_i$ ($i = 1, 2, \ldots, z$), n vectors
are picked at random from TS. A vector $(X_i, L_i)$ selected for one dataset can be reused to create another
dataset $TS_j$, $j \neq i$. Because sampling is done with replacement, a vector can be picked more than once for
the same dataset, and some vectors are not picked for a given $TS_j$. This procedure is called bagging
(bootstrap aggregating). For each $TS_i$, a tree $Z_i$ is grown. A new input vector $V_i$ is classified by passing
it through all z trees; each tree votes for a class, and the class of $V_i$ is decided by majority vote. Figure 4
shows the structure of the RF classifier. Table 1 lists the RF parameter values; this study uses an ensemble of
150 bagged decision trees.


Figure 4 panels: the new input vector V_i is passed to Decision Tree-1, Decision Tree-2, ..., Decision Tree-z; the majority of votes gives the final result.

Figure 4. The random forest diagram


Table 1. Variables values of the RF
Variable Name                                    Variable Value
Training TS                                      [539 x 3780]
Training L                                       [539 x 1]
Method                                           Classification
Number of ensemble bagged decision trees         150
Number of predictors                             3780
Number of predictors to sample                   62
Min leaf size                                    1
In bag fraction                                  1
Sample with replacement                          1
Compute OOB prediction                           0
Compute OOB predictor importance                 0
Proximity                                        0
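The two mechanisms described above, bootstrap sampling with replacement and majority voting, can be sketched in plain Python as follows. To keep the example short, each decision tree is replaced by a hypothetical nearest-centroid learner trained on a random subset of ⌈√F⌉ features; this is a bagged ensemble in the spirit of RF, not a random forest, because a real RF grows full decision trees and samples the candidate features afresh at every split. The square-root rule is consistent with Table 1, where ⌈√3780⌉ = 62. All function names are assumptions. In a bootstrap sample of n vectors, each vector is left out with probability (1 - 1/n)^n ≈ 0.368, which is why some vectors are not picked for a given bagged dataset.

```python
import math
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices uniformly with replacement: one bagged dataset TS_i."""
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(votes):
    """Most frequent label; ties go to the label that appears first."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-in for a decision tree: nearest class centroid on a feature subset.
def fit_centroids(X, y, features):
    sums, counts = {}, Counter()
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += x[f]
        counts[label] += 1
    return features, {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict_centroids(model, x):
    features, centroids = model
    def sq_dist(c):
        return sum((x[f] - c[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

def fit_bagged(X, y, n_learners=150, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    m = math.ceil(math.sqrt(n_features))            # 62 when n_features = 3780
    models = []
    for _ in range(n_learners):
        idx = bootstrap_indices(n, rng)             # sampling with replacement
        feats = rng.sample(range(n_features), m)    # random feature subset for this learner
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx], feats))
    return models

def predict_bagged(models, x):
    """Each learner votes; the majority decides the class of x."""
    return majority_vote([predict_centroids(mdl, x) for mdl in models])
```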


4. RESULTS AND DISCUSSION 

The proposed system aims to identify users from their ear images. HOG is applied to extract features
from the ear images, and the extracted features are then fed to the RF classifier to identify the users. The
system is implemented in MATLAB R2017a on a PC running Windows 10 with an Intel Core i7 3.20 GHz CPU
and 16 GB of RAM. Its efficiency and effectiveness are evaluated with several measures: accuracy,
specificity, precision, recall (sensitivity), F-measure, G-mean, and execution time (s). These measures are
computed as shown in (3)-(8) [34]-[36].


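$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Recall} + \text{Precision}} \tag{7}$$

$$\text{G-Mean} = \sqrt{\text{Specificity} \times \text{Recall}} \tag{8}$$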
Where TP denotes true positives, TN true negatives, and FP and FN false positives and false negatives,
respectively. Based on the experimental results, the proposed system achieved 99.69% accuracy and 99.84%
specificity. It also obtained a G-mean of 80.78% and took 81.44 seconds of execution time. However,
precision, recall, and F-measure were all equal to 65.35%. According to these results, the proposed system
with the RF classifier is able to recognize users from their ear images effectively. Table 2 shows the results
achieved by the proposed system.
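Equations (3)-(8) are stated for a binary problem, and this section does not say how TP, TN, FP, and FN were counted over 221 classes. One common convention, assumed here, treats each class as a one-vs-rest problem and pools the four counts over all classes. Under that assumption each misclassified test image adds one FN (for its true class) and one FP (for the predicted class), so precision, recall, and F-measure all equal the fraction of correctly identified test images, accuracy equals 1 - 2e/K, and specificity equals 1 - e/(K - 1), where e is the error rate and K the number of classes. With e = 1 - 0.6535 and K = 221, these give 99.69% accuracy and 99.84% specificity, and the G-mean √(0.998425 × 0.6535) ≈ 80.78%, which agrees with the figures reported above. The sketch below computes (3)-(8) under this pooling convention; the function names are hypothetical.

```python
import math

def pooled_counts(y_true, y_pred, num_classes):
    """Pool one-vs-rest TP/TN/FP/FN counts over every class 0..num_classes-1."""
    tp = tn = fp = fn = 0
    for c in range(num_classes):
        for t, p in zip(y_true, y_pred):
            if t == c and p == c:
                tp += 1
            elif p == c:          # predicted c, but the true class differs
                fp += 1
            elif t == c:          # true class c, but something else was predicted
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Evaluation measures (3)-(8) computed from the pooled counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (recall + precision)
    g_mean = math.sqrt(specificity * recall)
    return {"accuracy": accuracy, "specificity": specificity, "precision": precision,
            "recall": recall, "f_measure": f_measure, "g_mean": g_mean}
```

For example, with the 254 test images implied by Table 1 (793 - 539) and a hypothetical 166 correct identifications (166/254 ≈ 65.35%), these functions give 99.69% accuracy, 99.84% specificity, 65.35% precision, recall, and F-measure, and 80.78% G-mean after rounding.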

Moreover, the performance of the proposed system is compared with the methods presented in
[37]-[39]. The method in [37] used a CNN classifier to recognize ear images, while the method in [38] used
three types of classifiers: a support vector machine (SVM) with a radial basis function (RBF) kernel, an SVM
with a linear kernel, and K-nearest neighbours (K-NN); its experimental results showed that K-NN achieved
the best results. The study in [39] used an SVM classifier. All these methods used the IITD II database for ear
image recognition, which is the same database used in our method. The proposed system outperforms all of
them in terms of accuracy. Table 3 compares the methods in terms of ear image recognition accuracy.


Table 2. The achieved results of the proposed system 
Accuracy (%)  Specificity (%)  Precision (%)  Recall (%)  F-Measure (%)  G-Mean (%)  Execution Time (s)
99.69         99.84            65.35          65.35       65.35          80.78       81.44


Table 3. The comparison between methods in the ear image recognition 


Method              Accuracy (%)
Our Method (RF)     99.69
CNN [37]            97.36
K-NN [38]           97.63
SVM [39]            97.31


5. CONCLUSION 

Ear recognition systems based on biometric features are highly significant for identifying users from
their ear images, and they play an important role in security applications. In this paper, we have presented an
ear recognition system based on two well-known techniques, HOG and RF: HOG extracts features from the ear
images, and RF is used as a classifier to identify them. The ear images were taken from the IITD II database,
which contains 793 ear images of 221 classes. The proposed system reached 99.69% accuracy, 99.84%
specificity, and 80.78% G-mean, with an execution time of 81.44 seconds. However, precision, recall, and
F-measure were 65.35%. According to the experimental results, the proposed system identifies ear images
efficiently and obtains good results compared with other ear recognition systems. It is worth mentioning that,
to the best of our knowledge, this is the first study to use an RF classifier for ear image recognition. In future
work, these techniques can be applied to 3D ear images as input data. Other machine learning classifiers and
feature extraction techniques can also be used in such systems. Another direction is to apply dimensionality
reduction methods such as PCA to the extracted features.